Formal Structure of Sanskrit Text: Requirements Analysis for a Mechanical Sanskrit Processor
Abstract
We discuss the mathematical structure of various levels of representation of Sanskrit text in order to guide the design of computer aids aiming at useful processing of the digitalised Sanskrit corpus. Two main levels are identified, called the linear and functional levels. The design space of these two levels is sketched, and the computational implications of the main design choices are discussed. Current solutions to the problems of mechanical segmentation, tagging, and parsing of Sanskrit text are briefly surveyed in this light. An analysis of the requirements of relevant linguistic resources is provided, with a view to justifying standards that allow interoperability of computer tools. This paper does not attempt to provide definitive solutions to the representation of Sanskrit at the various levels. It should rather be considered a survey of various choices, allowing an open discussion of such issues in a formally precise general framework.
Similar resources
Extending the core functionalities of Aṣṭādhyāyī 2.0
The paper describes new layers of linguistic annotation and explorative tools that were added to the project ‘Aṣṭādhyāyī 2.0’. These additions make it possible to execute complex research queries in the digital version of Pāṇini’s grammar with minimal knowledge both of Sanskrit and database query languages. In the project ‘Aṣṭādhyāyī 2.0’, we have developed a digital edition of Pāṇini’s grammar...
SanskritTagger: A Stochastic Lexical and POS Tagger for Sanskrit
SanskritTagger is a stochastic tagger for unpreprocessed Sanskrit text. The tagger tokenises text with a Markov model and performs part-of-speech tagging with a Hidden Markov model. Parameters for these processes are estimated from a manually annotated corpus of currently about 1,500,000 words. The article sketches the tagging process, reports the results of tagging a few short passages of Sans...
Building a Prototype Text to Speech for Sanskrit
This paper describes the work done in building a prototype text-to-speech system for Sanskrit. A basic prototype text-to-speech system is built using a simplified Sanskrit phone set and a unit selection technique, in which prerecorded sub-word units are concatenated to synthesize a sentence. We also discuss the issues involved in building a full-fledged text-to-speech system for Sanskrit.
Analysis of Sanskrit Text: Parsing and Semantic Relations
In this paper, we present our work towards building a dependency parser for the Sanskrit language that uses deterministic finite automata (DFA) for morphological analysis and the ‘utsarga apavaada’ approach for relation analysis. A computational grammar based on the framework of Panini is being developed. A linguistic generalization for Verbal and Nominal database has been made and declensions ar...
Annotating and Analyzing the Aṣṭādhyāyī
The paper introduces the new research project ‘Aṣṭādhyāyī 2.0’, which aims at developing a digital edition of the Aṣṭādhyāyī, Pāṇini’s nearly 2,500-year-old grammar of Sanskrit, the ancient Indian language. For modern linguists this grammar is interesting for two reasons. First, its Western (re-)discovery in the 19th century had an enormous influence on contemporary linguistics. For example, the ...
Publication date: 2008